What is claimed is: 



1. A method for factoring an input finite-state transducer (FST) including an 
unknown symbol, comprising the steps of: 

replacing each occurrence of the unknown symbol in the input FST with the 
unknown symbol and a diacritic to define a left-sequential finite-state transducer 
(FST); and 

replacing each occurrence of the diacritic with a symbol representative of an 
empty string and an output symbol to define a right-sequential finite-state transducer 
(FST); 

wherein said replacing steps avoid direct factorization of the unknown symbol. 

2. The method of claim 1, further comprising the step of factoring the 
unknown symbol in the input FST into arc label sequences [?, δ:λ_i]_LR and [λ_i:ε, 
?:σ^out]_RL, where: 

λ_i is a diacritic, 

σ^out is an output symbol, and 

δ is a deterministic empty string. 

3. The method of claim 2, further comprising the step of copying the arc label 
sequence [?, δ:λ_i]_LR to the left-sequential FST. 



4. The method of claim 2, further comprising the step of copying the arc label 
sequence [λ_i:ε, ?:σ^out]_RL to the right-sequential FST. 
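
For illustration only, the arc label sequences recited in claims 2 to 4 can be sketched in Python as lists of (input, output) label pairs. This is a minimal sketch and not the claimed method itself: the function names, the string spellings `<delta>`, `<eps>` and `<lambda_i>`, and the reading of a bare `?` as the pair ?:? are all assumptions made for the example. It builds the sequence [?, δ:λ_i]_LR and the sequence [λ_i:ε, ?:σ^out]_RL for one occurrence of the unknown symbol.

```python
# Hypothetical spellings for the special symbols; only "?" appears in the claims.
UNKNOWN = "?"        # the unknown symbol
DELTA = "<delta>"    # deterministic empty string (assumed spelling)
EPSILON = "<eps>"    # empty string (assumed spelling)


def factor_unknown_label(output_symbol, index):
    """Build both arc label sequences for one occurrence of the unknown symbol.

    Each label is an (input, output) pair. A bare "?" is assumed to mean ?:?.
    """
    diacritic = f"<lambda_{index}>"  # hypothetical name for the diacritic
    # Sequence for the left-sequential FST, in left-to-right reading order.
    left_seq = [(UNKNOWN, UNKNOWN), (DELTA, diacritic)]
    # Sequence for the right-sequential FST, in right-to-left reading order.
    right_seq = [(diacritic, EPSILON), (UNKNOWN, output_symbol)]
    return left_seq, right_seq


def intermediate_string(left_seq):
    """Symbols emitted by the left-sequential side, in left-to-right order."""
    return [out for _, out in left_seq]


left, right = factor_unknown_label("a", 1)
```

Under these assumptions, the symbols emitted by the left sequence, read from right to left, are exactly the input symbols of the right sequence: the diacritic is consumed against the empty string, and the unknown symbol is then mapped to the output symbol.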

5. The method of claim 1, wherein the left-sequential FST and the right- 
sequential FST are adapted for performing language processing. 

6. The method of claim 5, wherein the language processing comprises one of 
tokenization, phonological analysis, morphological analysis, disambiguation, spelling 
correction, and shallow parsing. 

7. The method of claim 1, wherein the left-sequential FST and the right- 
sequential FST are lexical transducers. 

8. An apparatus for factoring an input finite-state transducer (FST) including 
an unknown symbol, comprising: 

means for replacing each occurrence of the unknown symbol in the input FST 
with the unknown symbol and a diacritic to define a left-sequential finite-state 
transducer (FST); and 

means for replacing each occurrence of the diacritic with a symbol 
representative of an empty string and an output symbol to define a right-sequential 
finite-state transducer (FST); 

wherein said replacing means avoid direct factorization of the unknown 
symbol. 

9. The apparatus of claim 8, further comprising means for factoring the 
unknown symbol in the input FST into arc label sequences [?, δ:λ_i]_LR and [λ_i:ε, 
?:σ^out]_RL, where: 

λ_i is a diacritic, 

σ^out is an output symbol, and 

δ is a deterministic empty string. 

10. The apparatus of claim 9, further comprising means for copying the arc 
label sequence [?, δ:λ_i]_LR to the left-sequential FST. 

11. The apparatus of claim 9, further comprising means for copying the arc 
label sequence [λ_i:ε, ?:σ^out]_RL to the right-sequential FST. 

12. The apparatus of claim 8, wherein the left-sequential FST and the right- 
sequential FST are adapted for performing language processing. 

13. The apparatus of claim 12, wherein the language processing comprises one 
of tokenization, phonological analysis, morphological analysis, disambiguation, 
spelling correction, and shallow parsing. 

14. The apparatus of claim 8, wherein the left-sequential FST and the right- 
sequential FST are lexical transducers. 